 sum value


Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Kuznetsov, Kristian, Kushnareva, Laida, Druzhinina, Polina, Razzhigaev, Anton, Voznyuk, Anastasia, Piontkovskaya, Irina, Burnaev, Evgeny, Barannikov, Serguei

arXiv.org Artificial Intelligence

Artificial Text Detection (ATD) is becoming increasingly important with the rise of advanced Large Language Models (LLMs). Despite numerous efforts, no single algorithm performs consistently well across different types of unseen text or guarantees effective generalization to new LLMs. Interpretability plays a crucial role in achieving this goal. In this study, we enhance ATD interpretability by using Sparse Autoencoders (SAE) to extract features from the Gemma-2-2b residual stream. We identify both interpretable and efficient features, analyzing their semantics and relevance through domain- and model-specific statistics, a steering approach, and manual or LLM-based interpretation. Our methods offer valuable insights into how texts from various models differ from human-written content. We show that modern LLMs have a distinct writing style, especially in information-dense domains, even though they can produce human-like outputs with personalized prompts.
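
To make the feature-extraction step concrete, here is a minimal sketch of how a sparse autoencoder can map residual-stream vectors to sparse feature activations and back. It assumes a plain ReLU encoder and a linear decoder; the abstract does not specify the SAE variant, so this is not necessarily the authors' setup. All function names, weight shapes, and the per-text averaging step are illustrative assumptions.

```python
def matvec(m, v):
    # m is a list of rows; returns the matrix-vector product m @ v.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sae_encode(x, w_enc, b_enc):
    # Feature activations: ReLU(W_enc x + b_enc).
    # With a trained SAE, most entries are expected to be zero (sparse).
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    return [max(0.0, p) for p in pre]

def sae_decode(f, w_dec, b_dec):
    # Reconstruction of the residual vector: W_dec f + b_dec.
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]

def mean_feature_activation(token_vectors, w_enc, b_enc):
    # Hypothetical text-level summary: average each feature over all tokens.
    acts = [sae_encode(x, w_enc, b_enc) for x in token_vectors]
    return [sum(col) / len(acts) for col in zip(*acts)]

# Toy weights (assumed): 2-dimensional "residual stream", 3 SAE features.
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]

features = sae_encode([2.0, -3.0], w_enc, b_enc)
reconstruction = sae_decode(features, w_dec, b_dec)
```

Text-level feature summaries of this kind could then be compared between human-written and model-generated texts, which is one way to read the domain- and model-specific statistics the abstract mentions.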


AI Sustainability in Practice Part One: Foundations for Sustainable AI Projects

Leslie, David, Rincon, Cami, Briggs, Morgan, Perini, Antonella, Jayadeva, Smera, Borda, Ann, Bennett, SJ, Burr, Christopher, Aitken, Mhairi, Katell, Michael, Fischer, Claudia, Wong, Janis, Garcia, Ismael Kherroubi

arXiv.org Artificial Intelligence

Sustainable AI projects are continuously responsive to the transformative effects as well as short-, medium-, and long-term impacts on individuals and society that the design, development, and deployment of AI technologies may have. Projects that centre AI Sustainability ensure that values-led, collaborative, and anticipatory reflection both guides the assessment of potential social and ethical impacts and steers responsible innovation practices. This workbook is the first part of a pair that provides the concepts and tools needed to put AI Sustainability into practice. It introduces the SUM Values, which help AI project teams to assess the potential societal impacts and ethical permissibility of their projects. It then presents a Stakeholder Engagement Process (SEP), which provides tools to facilitate proportionate engagement of and input from stakeholders with an emphasis on equitable and meaningful participation and positionality awareness.


LDEB -- Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues

Dey, Amitabha, Suthaharan, Shan

arXiv.org Artificial Intelligence

The development of an automated system for emotion recognition in conversations (ERC) is beneficial to many conversational AI applications [Hazarika et al., 2021, Bhat et al., 2021]. The recent language model ChatGPT has shown the usefulness of an automated system for ERC in the domain of conversational AI [Shahriar and Hayawi, 2023, Zhang et al., 2023]. Such a system can help advance research in many disciplines, including computational linguistics, neuroscience, and psychology [Canales and Martínez-Barco, 2014, Strapparava and Mihalcea, 2008]. There has been a significant effort to understand the emotions in conversations and to develop efficient computational techniques and machine learning classifiers for ERC using the information in conversational dialogues [Huang et al., 2018, 2019]. For example, [Huang et al., 2018], assuming that the textual information in a dialogue does not deliver sufficient information, proposed an approach to supply emotion information a priori at training. Subsequently, [Huang et al., 2019] used the Long Short-Term Memory (LSTM) architecture hierarchically, as an iterative model, to capture contextual emotional features so that the model can predict the emotions in textual dialogues. Machine learning (ML) can help us develop such an automated system, which recognizes emotions in a conversational dialogue by classifying them. For example, [Binali et al., 2010] adapted emotion theories based on Ekman's model and the OCC (Ortony/Clore/Collins) model and developed a support vector machine (SVM) classifier for emotion recognition on web blog data.
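
Multi-class emotion classification with a binary classifier such as an SVM is often set up one-vs-rest, which requires turning text labels into numeric targets first. The sketch below shows that label-preparation step only. The function names and the example labels are hypothetical, and this is not claimed to be the LDEB procedure itself, which the excerpt does not describe.

```python
def digitize_labels(labels):
    # Map each distinct emotion label to an integer id (sorted for stability).
    vocab = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [vocab[label] for label in labels], vocab

def binarize(labels, target):
    # One-vs-rest target vector: 1 where the utterance carries `target`, else 0.
    return [1 if label == target else 0 for label in labels]

# Assumed toy data: one emotion label per utterance in a dialogue.
utterance_labels = ["joy", "anger", "joy", "sadness"]

ids, vocab = digitize_labels(utterance_labels)
joy_vs_rest = binarize(utterance_labels, "joy")
```

Each binary target vector can then be paired with utterance features to train one binary classifier per emotion.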